Goto

Collaborating Authors

 Gulf of Mexico


A Quantifiable Information-Processing Hierarchy Provides a Necessary Condition for Detecting Agency

Kagan, Brett J., Baccetti, Valentina, Earp, Brian D., Boyd, J. Lomax, Savulescu, Julian, Razi, Adeel

arXiv.org Machine Learning

As intelligent systems are developed across diverse substrates - from machine learning models and neuromorphic hardware to in vitro neural cultures - understanding what gives a system agency has become increasingly important. Existing definitions, however, tend to rely on top-down descriptions that are difficult to quantify. We propose a bottom-up framework grounded in a system's information-processing order: the extent to which its transformation of input evolves over time. We identify three orders of information processing. Class I systems are reactive and memoryless, mapping inputs directly to outputs. Class II systems incorporate internal states that provide memory but follow fixed transformation rules. Class III systems are adaptive; their transformation rules themselves change as a function of prior activity. While not sufficient on their own, these dynamics represent necessary informational conditions for genuine agency. This hierarchy offers a measurable, substrate-independent way to identify the informational precursors of agency. We illustrate the framework with neurophysiological and computational examples, including thermostats and receptor-like memristors, and discuss its implications for the ethical and functional evaluation of systems that may exhibit agency.


Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth

Nair, Arjun S.

arXiv.org Machine Learning

Large language model fine-tuning is bottlenecked by memory: a 7B parameter model requires 84GB--14GB for weights, 14GB for gradients, and 56GB for FP32 optimizer states--exceeding even A100-40GB capacity. We present Chronicals, an open-source training framework achieving 3.51x speedup over Unsloth through four synergistic optimizations: (1) fused Triton kernels eliminating 75% of memory traffic via RMSNorm (7x), SwiGLU (5x), and QK-RoPE (2.3x) fusion; (2) Cut Cross-Entropy reducing logit memory from 5GB to 135MB through online softmax computation; (3) LoRA+ with theoretically-derived 16x differential learning rates between adapter matrices; and (4) Best-Fit Decreasing sequence packing recovering 60-75% of compute wasted on padding. On Qwen2.5-0.5B with A100-40GB, Chronicals achieves 41,184 tokens/second for full fine-tuning versus Unsloth's 11,736 tokens/second (3.51x). For LoRA at rank 32, we reach 11,699 tokens/second versus Unsloth MAX's 2,857 tokens/second (4.10x). Critically, we discovered that Unsloth's reported 46,000 tokens/second benchmark exhibited zero gradient norms--the model was not training. We provide complete mathematical foundations: online softmax correctness proofs, FlashAttention IO complexity bounds O(N^2 d^2 M^{-1}), LoRA+ learning rate derivations from gradient magnitude analysis, and bin-packing approximation guarantees. All implementations, benchmarks, and proofs are available at https://github.com/Ajwebdevs/Chronicals with pip installation via https://pypi.org/project/chronicals/.


Supplementary Material for Paper " Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs " A Criteria for Node Equality When Merging Traces

Neural Information Processing Systems

TraceGraph, it compares the type, attributes, and the executed location of each operation. For example, the MatMul operation of TensorFlow has ' MatMul ' as GraphGenerator fails to match because of the different attributes. The pushed call id is popped when the function is returned. As same as the call id stack, Terra manages the loop id stack for the entire program execution. Current implementation of Terra does not consider multi-threading yet.


Neural Proximal Gradient Descent for Compressive Imaging

Neural Information Processing Systems

Recovering high-resolution images from limited sensory data typically leads to a serious ill-posed inverse problem, demanding inversion algorithms that effectively capture the prior information. Learning a good inverse mapping from training data faces severe challenges, including: (i) scarcity of training data; (ii) need for plausible reconstructions that are physically feasible; (iii) need for fast reconstruction, especially in real-time applications. We develop a successful system solving all these challenges, using as basic architecture the repetitive application of alternating proximal and data fidelity constraints. We learn a proximal map that works well with real images based on residual networks with recurrent blocks. Extensive experiments are carried out under different settings: (a) reconstructing abdominal MRI of pediatric patients from highly undersampled k-space data and (b) super-resolving natural face images. Our key findings include: 1. a recurrent ResNet with a single residual block (10-fold repetition) yields an effective proximal which accurately reveals MR image details. 2. Our architecture significantly outperforms conventional non-recurrent deep ResNets by 2dB SNR; it is also trained much more rapidly.


On the Universal Representation Property of Spiking Neural Networks

Hundrieser, Shayan, Tuchel, Philipp, Kong, Insung, Schmidt-Hieber, Johannes

arXiv.org Machine Learning

Inspired by biology, spiking neural networks (SNNs) process information via discrete spikes over time, offering an energy-efficient alternative to the classical computing paradigm and classical artificial neural networks (ANNs). In this work, we analyze the representational power of SNNs by viewing them as sequence-to-sequence processors of spikes, i.e., systems that transform a stream of input spikes into a stream of output spikes. We establish the universal representation property for a natural class of spike train functions. Our results are fully quantitative, constructive, and near-optimal in the number of required weights and neurons. The analysis reveals that SNNs are particularly well-suited to represent functions with few inputs, low temporal complexity, or compositions of such functions. The latter is of particular interest, as it indicates that deep SNNs can efficiently capture composite functions via a modular design. As an application of our results, we discuss spike train classification. Overall, these results contribute to a rigorous foundation for understanding the capabilities and limitations of spike-based neuromorphic systems.


Co-Hub Node Based Multiview Graph Learning with Theoretical Guarantees

Banerjee, Bisakh, Alwardat, Mohammad, Maiti, Tapabrata, Aviyente, Selin

arXiv.org Machine Learning

Identifying the graphical structure underlying the observed multivariate data is essential in numerous applications. Current methodologies are predominantly confined to deducing a singular graph under the presumption that the observed data are uniform. However, many contexts involve heterogeneous datasets that feature multiple closely related graphs, typically referred to as multiview graphs. Previous research on multiview graph learning promotes edge-based similarity across layers using pairwise or consensus-based regularizers. However, multiview graphs frequently exhibit a shared node-based architecture across different views, such as common hub nodes. Such commonalities can enhance the precision of learning and provide interpretive insight. In this paper, we propose a co-hub node model, positing that different views share a common group of hub nodes. The associated optimization framework is developed by enforcing structured sparsity on the connections of these co-hub nodes. Moreover, we present a theoretical examination of layer identifiability and determine bounds on estimation error. The proposed methodology is validated using both synthetic graph data and fMRI time series data from multiple subjects to discern several closely related graphs.


Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation

Cheng, Shihan, Kulkarni, Nilesh, Hyde, David, Smirnov, Dmitriy

arXiv.org Artificial Intelligence

Fine-tuning large-scale text-to-video diffusion models to add new generative controls, such as those over physical camera parameters (e.g., shutter speed or aperture), typically requires vast, high-fidelity datasets that are difficult to acquire. In this work, we propose a data-efficient fine-tuning strategy that learns these controls from sparse, low-quality synthetic data. W e show that not only does fine-tuning on such simple data enable the desired controls, it actually yields superior results to models fine-tuned on pho-torealistic "real" data. Beyond demonstrating these results, we provide a framework that justifies this phenomenon both intuitively and quantitatively.


Enhancing Floor Plan Recognition: A Hybrid Mix-Transformer and U-Net Approach for Precise Wall Segmentation

Parashchuk, Dmitriy, Kapshitskiy, Alexey, Karyakin, Yuriy

arXiv.org Artificial Intelligence

Automatic 3D reconstruction of indoor spaces from 2D floor plans necessitates high-precision semantic segmentation of structural elements, particularly walls. However, existing methods often struggle with detecting thin structures and maintaining geometric precision. This study introduces MitUNet, a hybrid neural network combining a Mix-Transformer encoder and a U-Net decoder enhanced with spatial and channel attention blocks. Our approach, optimized with the Tversky loss function, achieves a balance between precision and recall, ensuring accurate boundary recovery. Experiments on the CubiCasa5k dataset and a proprietary regional dataset demonstrate MitUNet's superiority in generating structurally correct masks with high boundary accuracy, outperforming standard models. This tool provides a robust foundation for automated 3D reconstruction pipelines. To ensure reproducibility and facilitate future research, the source code and the proprietary regional dataset are publicly available at https://github.com/aliasstudio/mitunet and https://doi.org/10.5281/zenodo.17871079 respectively.


The Impossibility of Inverse Permutation Learning in Transformer Models

Alur, Rohan, Hays, Chris, Raghavan, Manish, Shah, Devavrat

arXiv.org Artificial Intelligence

In this technical note, we study the problem of inverse permutation learning in decoder-only transformers. Given a permutation and a string to which that permutation has been applied, the model is tasked with producing the original (``canonical'') string. We argue that this task models a natural robustness property across a variety of reasoning tasks, including long-context retrieval, multiple choice QA and in-context learning. Our primary contribution is an impossibility result: we show that an arbitrary depth, decoder-only transformer cannot learn this task. This result concerns the expressive capacity of decoder-only transformer models and is agnostic to training dynamics or sample complexity. We give a pair of alternative constructions under which inverse permutation learning is feasible. The first of these highlights the fundamental role of the causal attention mask, and reveals a gap between the expressivity of encoder-decoder transformers and the more popular decoder-only architecture. The latter result is more surprising: we show that simply padding the input with ``scratch tokens" yields a construction under which inverse permutation learning is possible. We conjecture that this may suggest an alternative mechanism by which chain-of-thought prompting or, more generally, intermediate ``thinking'' tokens can enable reasoning in large language models, even when these tokens encode no meaningful semantic information (e.g., the results of intermediate computations).


NeuroSketch: An Effective Framework for Neural Decoding via Systematic Architectural Optimization

Zhang, Gaorui, Yuan, Zhizhang, Yang, Jialan, Chen, Junru, Meng, Li, Yang, Yang

arXiv.org Artificial Intelligence

Neural decoding, a critical component of Brain-Computer Interface (BCI), has recently attracted increasing research interest. Previous research has focused on leveraging signal processing and deep learning methods to enhance neural decoding performance. However, the in-depth exploration of model architectures remains underexplored, despite its proven effectiveness in other tasks such as energy forecasting and image classification. In this study, we propose NeuroSketch, an effective framework for neural decoding via systematic architecture optimization. Starting with the basic architecture study, we find that CNN-2D outperforms other architectures in neural decoding tasks and explore its effectiveness from temporal and spatial perspectives. Building on this, we optimize the architecture from macro- to micro-level, achieving improvements in performance at each step. The exploration process and model validations take over 5,000 experiments spanning three distinct modalities (visual, auditory, and speech), three types of brain signals (EEG, SEEG, and ECoG), and eight diverse decoding tasks. Experimental results indicate that NeuroSketch achieves state-of-the-art (SOTA) performance across all evaluated datasets, positioning it as a powerful tool for neural decoding. Our code and scripts are available at https://github.com/Galaxy-Dawn/NeuroSketch.